Genshin Impact is a free-to-play game that spans the action, role-playing (RPG), and open-world genres. It is also a 'gacha' game, in which in-game currency is spent to test one's luck at obtaining specially rated characters that help improve a player's account. The game was created by MiHoYo (now Hoyoverse), based in Shanghai, China.
The game is currently available on iOS and Android, on compatible PCs, and on the PlayStation 4 and 5, with a Nintendo Switch release due sometime in 2022.
As stated on Wikipedia: "Prior to its release the game had over 10 million registrations, with over half of that from outside China. According to some, the game was the biggest international release of any Chinese video game. In the lead up to release, the game won the Tokyo Game Show Media Awards 2020 public poll, ranking first among 14 other games."
Genshin Impact has grown rapidly since its launch, taking the game industry by storm and making a name for itself around the world.
The game generates revenue through its limited-time event wishes (character banners, as players call them) and has recently started selling "skins", or alternate outfits, for playable characters. It also has a Battle Pass system in which players complete challenges or objectives over the course of 6 weeks for rewards.
In this project, we analyze how various factors relate to the success of a promotional character.
As seen in the graphic above, Genshin Impact is estimated to have the highest first year gaming revenue in recent times, slightly edging past Fortnite in the Earnings Estimate. To the average person it may look like "an anime game" yet it also surpasses "mainstream" gaming such as GTA V and Call of Duty: Modern Warfare when it comes to first-year sales. This is indicative of how strong the launch of this game was.
If you are an outsider to this game, this will be a chance to gain insight into how the gacha system in Genshin Impact works and into some potential factors behind the success of a promotional character. We have also included some recorded gameplay clips to help loop you in!
If you are a player, you may get to learn how some others make their in-game decisions and the rationale behind them.
As previously mentioned, Genshin Impact uses a summon system to obtain promotional, limited-time characters.
All characters in Genshin Impact have a rating of either "4-star" or "5-star". Promotional characters belong to the 5-star (5*) category. Barring the five characters Jean, Qiqi, Mona, Keqing and Diluc, plus the main protagonist (Aether if male or Lumine if female), every 5* character is limited and is available for a 3-week period only. Weapons follow a similar 4* and 5* categorization, but they have a slightly different summon system despite using the same summon currency as the promotional 5* character.
If someone says "I'm rolling this banner" it means they are using their summon currency for the promotional character or weapon of this patch.
The summons are typically called "Wishes" or "Rolls" (the in-game currency is named Intertwined Fate or Acquaint Fate), and the number of wishes made since the last 5* summon is called "Pity". For example, if someone says "I'm at 67 pity", they mean that they've used 67 wishes since their last 5* summon.
Click here for more detailed information on the Genshin Summon System.
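To make the pity counter concrete, here is a minimal sketch. It assumes a wish history given as a plain list of star ratings in chronological order; `current_pity` is a hypothetical helper written for this tutorial, not something provided by the game or by paimon.moe.

```python
def current_pity(star_ratings):
    """Return the number of wishes made since the most recent 5* pull.

    star_ratings: star values (3, 4 or 5) in chronological order.
    """
    pity = 0
    for stars in star_ratings:
        if stars == 5:
            pity = 0      # a 5* pull resets the counter
        else:
            pity += 1     # every other wish adds one to the counter
    return pity

# Hypothetical history: a 5* pull followed by 67 wishes without one
history = [3, 4, 5] + [3] * 60 + [4] * 7
print(current_pity(history))
```

A player with this history would say they are "at 67 pity".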
Get a feel for how gameplay looks here: https://www.youtube.com/embed/XCzgt03R5sE
Many Genshin players use a website called paimon.moe to keep track of their Genshin Wish History. Hoyoverse deletes any wish records that are 6 months or older, and paimon.moe is generally among the most trusted third-party tools for storing that history beyond the cutoff. It is also known for being very organized and informative, so many players use it!
In addition, there is no public dataset of Genshin wishes, so players have to compile their own if they want to see the data. For this tutorial, we use wish data obtained from here for various analyses, and we give you the opportunity to send your own data our way to process.
The first step is to collect roll data through paimon.moe as an Excel spreadsheet. You can follow the instructions to gather the data and compile it into an Excel sheet by clicking the link in the above paragraph.
Afterwards, we want to load all the Excel sheets into the workspace. To start off, we import the libraries needed for the investigation and analysis in a Jupyter notebook. We then use the pandas library, a widely used library for data manipulation and analysis, to read each Excel file and build one combined wish dataset.
In addition, we removed all rows dated before 11/2/2021 to limit the number of banners/characters in our dataset. This is because the data for earlier banners (each a 3-week period) is somewhat incomplete: as previously stated, Hoyoverse deletes any records that are 6 months or older. Filtering these rows out streamlines our dataset.
# all our imports
!pip3 install requests beautifulsoup4
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import toolz
from bs4 import BeautifulSoup as bs4
import scipy.stats as stats
from sklearn import datasets, svm, metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_predict, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from statsmodels.formula.api import ols
import statsmodels.formula.api as sms
# sm is the formula API (statsmodels.formula.api), not statsmodels.api
import statsmodels.formula.api as sm
#combines extracted datasets from paimon.moe to create one big dataset
columns = ['Type', 'Name', 'Time', '⭐', 'Pity', '#Roll', 'Group', 'Banner']
# the first export has no number in its file name; the other ten are numbered 1 to 10
files = ['paimonmoe_wish_history.xlsx'] + [f'paimonmoe_wish_history{i}.xlsx' for i in range(1, 11)]
frames = [pd.DataFrame(pd.read_excel(file), columns=columns) for file in files]
dataset = pd.concat(frames)
# keep only the rolls made after the cutoff date
dataset = dataset.loc[dataset['Time'] > '2021-11-02 00:00:00']
dataset = dataset.drop(['Group'], axis = 1)
dataset
| | Type | Name | Time | ⭐ | Pity | #Roll | Banner |
|---|---|---|---|---|---|---|---|
| 0 | Weapon | Ferrous Shadow | 2021-11-12 12:15:27 | 3 | 1 | 1 | Moment of Bloom |
| 1 | Character | Sayu | 2021-11-16 12:06:50 | 4 | 2 | 2 | Moment of Bloom |
| 2 | Weapon | Emerald Orb | 2021-11-16 12:47:55 | 3 | 1 | 3 | Moment of Bloom |
| 3 | Weapon | Debate Club | 2021-11-16 12:57:23 | 3 | 1 | 4 | Moment of Bloom |
| 4 | Weapon | Ferrous Shadow | 2021-11-16 13:13:06 | 3 | 1 | 5 | Moment of Bloom |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 790 | Weapon | Skyrider Sword | 2022-05-11 07:01:08 | 3 | 1 | 84 | The Herons Court |
| 791 | Character | Razor | 2022-05-11 07:01:14 | 4 | 5 | 85 | The Herons Court |
| 792 | Weapon | Cool Steel | 2022-05-11 07:01:54 | 3 | 1 | 86 | The Herons Court |
| 793 | Weapon | Cool Steel | 2022-05-12 12:06:43 | 3 | 1 | 87 | The Herons Court |
| 794 | Weapon | Slingshot | 2022-05-14 15:27:28 | 3 | 1 | 88 | The Herons Court |
4379 rows × 7 columns
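The cutoff filter can also be written with explicit timestamps, which makes it clear that the comparison is a date comparison rather than a text comparison. The sketch below uses a small made-up frame (the `toy` rows and the first timestamp are illustrative assumptions, not rows from our dataset).

```python
import pandas as pd

# Toy wish history; the rows are made up for illustration
toy = pd.DataFrame({
    'Name': ['Debate Club', 'Sayu', 'Slingshot'],
    'Time': ['2021-10-30 09:00:00', '2021-11-16 12:06:50', '2022-05-14 15:27:28'],
})

# Parse the timestamps explicitly so the comparison is a date comparison
toy['Time'] = pd.to_datetime(toy['Time'])
cutoff = pd.Timestamp('2021-11-02')

# Keep only the rolls made after the cutoff
recent = toy.loc[toy['Time'] > cutoff]
print(recent)
```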
Next, we built a dictionary to count the number of times each banner is rolled in our dataset. We then made a dataframe of each featured character and the total number of rolls on their banner, and plotted a bar graph with the character on the x-axis and the number of wishes on the y-axis. This gives us an idea of how each banner performed in our data. We treat more rolls on a banner as a sign of a more attractive banner, since players were willing to spend more wishes on it.
#plots the amount of rolls per character in our dataset via bar graph
import matplotlib.pyplot as plt
# count how many rolls were made on each banner
counter = {}
for c in dataset['Banner']:
    if c not in counter:
        counter[c] = 0
    counter[c] += 1
fig = plt.figure(figsize=(22, 15))
ax = fig.add_axes([0,0,1,1])
# featured characters and their roll totals (entered by hand)
characters = ['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu', 'Yae Miko', 'Raiden Shogun',
              'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti', 'Kamisato Ayaka']
count = [227, 425, 113, 561, 413, 102, 63, 531, 431, 336, 52, 498, 49, 578]
ax.bar(characters, count)
ax.set_ylabel('# of Rolls on the Banner')
ax.set_xlabel('Character')
ax.set_title('Total Number of Rolls per Banner with a Given Character in Genshin')
plt.show()
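The per-banner tally built by the loop above can also be produced with the standard library. This is only a sketch: the `banners` list is an illustrative stand-in for `dataset['Banner']`, not our real data.

```python
from collections import Counter

# Stand-in for dataset['Banner']; the values are illustrative only
banners = ['Moment of Bloom', 'Moment of Bloom', 'The Herons Court',
           'Moment of Bloom', 'The Herons Court']

# Counter does the same job as the manual dictionary loop
rolls_per_banner = Counter(banners)

# Print the banners from most rolled to least rolled
for banner, rolls in rolls_per_banner.most_common():
    print(banner, rolls)
```

With the real dataframe, `dataset['Banner'].value_counts()` should give the same per-banner totals in one call.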
We are going to explore trends between the following:
We will start looking at factors that go into rolls starting with the seiyuus (voice actors/actresses) behind the characters.
We used MyAnimeList's Top 150 entries for web scraping, and for the seiyuus who did not make the top 150, we manually keyed in the data. Some entries were hard to find, such as Arataki Itto's seiyuu, who only has a music entry of his own on MAL; we substituted that entry's numbers for his ratings.
We'll implement this using Beautiful Soup since it was the easiest way to handle it.
#Web Scrape data from MyAnimeList.net's top 150 voice actors/actresses using Beautiful Soup
#Rating is measured by number of people who have "Favorited" that voice actor/actress.
#Final data is stored in a pandas dataframe called seiyuu_table_final
def scrape_mal_page(url):
    #Scrape one page (50 entries) of the MyAnimeList people ranking into a dataframe
    seiyuu_data = requests.get(url)
    soup = bs4(seiyuu_data.content,'html.parser')
    table = soup.find('table')
    page_table = pd.DataFrame(columns = ['rank', 'name', 'birthday','favorites'], index = range(0,50))
    row_marker = 0
    for row in table.find_all('tr'):
        column_marker = 0
        columns = row.find_all('td')
        row_marker += 1
        for column in columns:
            #skip the first row of the table
            if (row_marker > 1):
                page_table.iat[row_marker-2,column_marker] = column.get_text()
                column_marker += 1
    #strip the newlines left over from the table cells
    for col in page_table.columns:
        page_table[col] = page_table[col].str.replace('\n','')
    return page_table

#each page has 50 entries, so we will repeat the process 3 times for 3*50 = 150 table entries
url = 'https://myanimelist.net/people.php'
seiyuu_table = scrape_mal_page(url)
#Entries 51-100 on MyAnimeList
url2 = 'https://myanimelist.net/people.php?limit=50'
seiyuu_table2 = scrape_mal_page(url2)
#Entries 101-150 on MyAnimeList
url3 = 'https://myanimelist.net/people.php?limit=100'
seiyuu_table3 = scrape_mal_page(url3)
#Merge the data from all 3 pages from MyAnimeList
seiyuu_table_final = pd.concat([seiyuu_table,seiyuu_table2, seiyuu_table3])
#Remove the Japanese text to preserve English text in the form "LastName, FirstName"
#([A-Za-z] is used because [A-z] would also match a few punctuation characters)
seiyuu_table_final['name'] = seiyuu_table_final['name'].str.extract(pat = ('([A-Za-z]+, [A-Za-z]+)'))
seiyuu_table_final
| | rank | name | birthday | favorites |
|---|---|---|---|---|
| 0 | 1 | Kamiya, Hiroshi | Jan 28, 1975 | 102,013 |
| 1 | 2 | Hanazawa, Kana | Feb 25, 1989 | 98,144 |
| 2 | 3 | Miyano, Mamoru | Jun 8, 1983 | 85,033 |
| 3 | 4 | Kaji, Yuuki | Sep 3, 1985 | 70,286 |
| 4 | 5 | Miyazaki, Hayao | Jan 5, 1941 | 66,127 |
| ... | ... | ... | ... | ... |
| 45 | 146 | Koyama, Rikiya | Dec 18, 1963 | 4,114 |
| 46 | 147 | Furukawa, Makoto | Sep 29, 1989 | 4,062 |
| 47 | 148 | Penkin, Kevin | May 22, 1992 | 4,040 |
| 48 | 149 | Morikawa, Toshiyuki | Jan 26, 1967 | 4,030 |
| 49 | 150 | Tezuka, Osamu | Nov 3, 1928 | 4,006 |
150 rows × 4 columns
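The scraped cells are plain strings, so the names and favorite counts need cleaning before they can be compared or added. The sketch below shows that cleaning on a single entry. The raw strings imitate what a scraped cell might contain (the exact page markup is an assumption), and `clean_entry` is a hypothetical helper. Note that the character class is spelled `[A-Za-z]`, since `[A-z]` would also match the few punctuation characters that sit between `Z` and `a` in ASCII.

```python
import re

def clean_entry(raw_name, raw_favorites):
    """Extract 'LastName, FirstName' and convert '102,013' to the integer 102013."""
    match = re.search(r'[A-Za-z]+, [A-Za-z]+', raw_name)
    name = match.group(0) if match else None
    favorites = int(raw_favorites.replace(',', '').strip())
    return name, favorites

# The raw strings below imitate scraped cells; the exact page markup is an assumption
print(clean_entry('\nKamiya, Hiroshi\n神谷 浩史\n', '102,013\n'))
```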
#Web Scrape Japanese voice acting cast data for Genshin Impact Characters
url4 = 'https://gamewith.net/genshin-impact/article/show/22638'
genshin_seiyuu_data = requests.get(url4)
soup = bs4(genshin_seiyuu_data.content, 'html.parser')
table = soup.find('table')
genshin_seiyuu_table = pd.DataFrame(columns = ['Character', 'Seiyuu'], index = range(0,27))
row_marker = 0
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    row_marker += 1
    for column in columns:
        #skip the first row of the table
        if (row_marker > 1):
            genshin_seiyuu_table.iat[row_marker-2,column_marker] = column.get_text()
            column_marker += 1
seiyuu_dict = {'Character':['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu', 'Yae Miko', 'Raiden Shogun',
               'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti', 'Kamisato Ayaka', 'Yelan'], 'Seiyuu': [''] * 15,
               '#MAL Favorites' : [''] * 15}
seiyuu_df = pd.DataFrame(seiyuu_dict)
#Keep only the Japanese cast entry ("JP: ...") and put a comma between the two names
genshin_seiyuu_table['Seiyuu'] = genshin_seiyuu_table['Seiyuu'].str.extract(pat = ('(JP: [A-Za-z]+ [A-Za-z]+)'))
genshin_seiyuu_table['Seiyuu'] = genshin_seiyuu_table['Seiyuu'].str.replace('JP: ', '')
genshin_seiyuu_table['Seiyuu'] = genshin_seiyuu_table['Seiyuu'].str.replace(' ', ', ')
#Format the voice actor/actress names to LastName, FirstName
#Characters whose seiyuu name is taken from the scraped table (scraped character name -> row in seiyuu_df)
scraped_rows = {'Hu Tao': 0, 'Itto': 3, 'Shenhe': 4, 'Ganyu': 7, 'Ayato': 11, 'Venti': 12, 'Ayaka': 13}
#Characters whose seiyuu name is keyed in manually (scraped character name -> (row in seiyuu_df, seiyuu name))
manual_rows = {'Eula': (1, 'Satou, Rina'), 'Albedo': (2, 'Nojima, Kenji'), 'Xiao': (5, 'Matsuoka, Yoshitsugu'),
               'Zhongli': (6, 'Maeno, Tomoaki'), 'Yae Miko': (8, 'Sakura, Ayane'),
               'Raiden Shogun': (9, 'Sawashiro, Miyuki'), 'Kokomi': (10, 'Mimori, Suzuko')}
for index, vas in genshin_seiyuu_table.iterrows():
    if vas['Character'] in scraped_rows:
        seiyuu_df.loc[scraped_rows[vas['Character']], 'Seiyuu'] = vas['Seiyuu']
    elif vas['Character'] in manual_rows:
        row, name = manual_rows[vas['Character']]
        seiyuu_df.loc[row, 'Seiyuu'] = name
#Yelan's seiyuu is keyed in manually
seiyuu_df.loc[14, 'Seiyuu'] = 'Sakamoto, Maaya'
seiyuu_df
| | Character | Seiyuu | #MAL Favorites |
|---|---|---|---|
| 0 | Hu Tao | Takahashi, Rie | |
| 1 | Eula | Satou, Rina | |
| 2 | Albedo | Nojima, Kenji | |
| 3 | Arataki Itto | Nishikawa, Takanori | |
| 4 | Shenhe | Kawasumi, Ayako | |
| 5 | Xiao | Matsuoka, Yoshitsugu | |
| 6 | Zhongli | Maeno, Tomoaki | |
| 7 | Ganyu | Ueda, Reina | |
| 8 | Yae Miko | Sakura, Ayane | |
| 9 | Raiden Shogun | Sawashiro, Miyuki | |
| 10 | Sangonomiya Kokomi | Mimori, Suzuko | |
| 11 | Kamisato Ayato | Ishida, Akira | |
| 12 | Venti | Ayumu, Murase | |
| 13 | Kamisato Ayaka | Hayami, Saori | |
| 14 | Yelan | Sakamoto, Maaya | |
# Now we will fill in the MAL Favorites Data (some are manual)
# and create a new column to indicate whether a character's seiyuu is in the MyAnimeList Top 150 or not
seiyuu_df['MAL Top 150'] = False
# Name to look for in the MyAnimeList table -> row of that seiyuu in seiyuu_df
# NOTE: Venti's seiyuu was scraped as 'Ayumu, Murase' (given name first), so that key only matches
# if MyAnimeList lists him in the same order
mal_name_to_row = {'Takahashi, Rie': 0, 'Sato, Rina': 1, 'Nojima, Kenji': 2, 'Nishikawa, Takanori': 3,
                   'Kawasumi, Ayako': 4, 'Matsuoka, Yoshitsugu': 5, 'Maeno, Tomoaki': 6, 'Ueda, Reina': 7,
                   'Sakura, Ayane': 8, 'Sawashiro, Miyuki': 9, 'Mimori, Suzuko': 10, 'Ishida, Akira': 11,
                   'Ayumu, Murase': 12, 'Hayami, Saori': 13, 'Sakamoto, Maaya': 14}
for index, rows in seiyuu_table_final.iterrows():
    row = mal_name_to_row.get(rows['name'])
    if row is not None:
        seiyuu_df.loc[row, '#MAL Favorites'] = rows['favorites']
        seiyuu_df.loc[row, 'MAL Top 150'] = True
#Fill in missing data for #MAL favorites on those who are not in the Top 150
seiyuu_df.loc[1,'#MAL Favorites'] = '2,670'
seiyuu_df.loc[7,'#MAL Favorites'] = '3417'
seiyuu_df.loc[2,'#MAL Favorites'] = '965'
seiyuu_df.loc[3,'#MAL Favorites'] = '593' #This is from T.M Revolution which is the only data available about Takanori Nishikawa on MyAnimeList
seiyuu_df.loc[6,'#MAL Favorites'] = '3,662'
seiyuu_df.loc[10,'#MAL Favorites'] = '3,203'
seiyuu_df.loc[12,'#MAL Favorites'] = '5,262'
seiyuu_df
| | Character | Seiyuu | #MAL Favorites | MAL Top 150 |
|---|---|---|---|---|
| 0 | Hu Tao | Takahashi, Rie | 45,055 | True |
| 1 | Eula | Satou, Rina | 2,670 | False |
| 2 | Albedo | Nojima, Kenji | 965 | False |
| 3 | Arataki Itto | Nishikawa, Takanori | 593 | False |
| 4 | Shenhe | Kawasumi, Ayako | 4,470 | True |
| 5 | Xiao | Matsuoka, Yoshitsugu | 37,623 | True |
| 6 | Zhongli | Maeno, Tomoaki | 3,662 | False |
| 7 | Ganyu | Ueda, Reina | 3417 | False |
| 8 | Yae Miko | Sakura, Ayane | 17,141 | True |
| 9 | Raiden Shogun | Sawashiro, Miyuki | 37,907 | True |
| 10 | Sangonomiya Kokomi | Mimori, Suzuko | 3,203 | False |
| 11 | Kamisato Ayato | Ishida, Akira | 11,323 | True |
| 12 | Venti | Ayumu, Murase | 5,262 | False |
| 13 | Kamisato Ayaka | Hayami, Saori | 53,529 | True |
| 14 | Yelan | Sakamoto, Maaya | 16,320 | True |
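The lookup of each seiyuu in the scraped ranking can also be expressed as a table join. The sketch below is an alternative formulation, not the method used above: it builds two toy frames (`cast` and `top150`, holding only a few rows copied from the tables in this tutorial) and uses a left join so that every character is kept whether or not a match is found.

```python
import pandas as pd

# Toy versions of the two tables; only a few rows are reproduced here
cast = pd.DataFrame({'Character': ['Hu Tao', 'Albedo'],
                     'Seiyuu': ['Takahashi, Rie', 'Nojima, Kenji']})
top150 = pd.DataFrame({'name': ['Takahashi, Rie', 'Hayami, Saori'],
                       'favorites': ['45,055', '53,529']})

# A left join keeps every character; indicator=True records whether a match was found
merged = cast.merge(top150, how='left', left_on='Seiyuu', right_on='name', indicator=True)
merged['MAL Top 150'] = merged['_merge'] == 'both'
merged = merged.drop(columns=['name', '_merge'])
print(merged)
```

A join like this only matches names that are spelled identically in both tables, so differences such as name order or romanization would still need to be handled by hand.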
# Genshin Sales Data from Japan and China
# Japan Data will be manually added from https://game-i.daa.jp/?%E3%82%AC%E3%83%81%E3%83%A3%E5%88%86%E6%9E%90%2F%E5%8E%9F%E7%A5%9E
# Formatting the data via web scraping is difficult due to inaccurate English translations.
# The website reports sales in units of 100 million yen (shown as "G" in the English translation), so 22.71 G is 2.271 billion yen.
# NOTE: This is data for iOS sales only. We are currently unable to access data for PS4, PS5 and PC sales since the company does not disclose
# this data.
sales_JP = {}
# Sales in billions of yen.
sales_JP['The Heron\'s Court'] = 2.271
sales_JP['Azure Excursion + Ballad in Goblets'] = 3.201
sales_JP['Reign of Serenity + Drifting Luminescence'] = 2.027
sales_JP['Everbloom Violet'] = 1.733
sales_JP['Gentry of Hermitage + Adrift in the Harbor'] = 2.394
sales_JP['Invitation to Mundane Life + The Transcendant One Returns'] = 2.347
sales_JP['Oni\'s Royale'] = 1.071
sales_JP['Secretum Secretorum + Born of Ocean Swell'] = 1.667
sales_JP['Moment of Bloom'] = 2.468
# ----------------------------------------------
def yen_billions_to_usd(yen):
    usd_per_yen = 0.0077 #As of 5/15/2022
    # each entry is a number of billions of yen, so we multiply by 10^9 before converting
    return int(usd_per_yen * (yen * (1000000000)))
#We convert our sales from Japanese Yen to US Dollar for the sake of consistency
sales_JP = toolz.valmap(yen_billions_to_usd, sales_JP)
print('Japan')
sales_JP = pd.DataFrame(sales_JP.items(), columns = ['Banner','Sales Japan (in USD)'])
sales_JP
Japan
| | Banner | Sales Japan (in USD) |
|---|---|---|
| 0 | The Heron's Court | 17486700 |
| 1 | Azure Excursion + Ballad in Goblets | 24647700 |
| 2 | Reign of Serenity + Drifting Luminescence | 15607900 |
| 3 | Everbloom Violet | 13344100 |
| 4 | Gentry of Hermitage + Adrift in the Harbor | 18433800 |
| 5 | Invitation to Mundane Life + The Transcendant ... | 18071900 |
| 6 | Oni's Royale | 8246700 |
| 7 | Secretum Secretorum + Born of Ocean Swell | 12835900 |
| 8 | Moment of Bloom | 19003600 |
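As a worked example of the conversion, the sketch below redoes the arithmetic for two of the banners. The rate of 0.0077 USD per yen is the one quoted in this tutorial; `billions_of_yen_to_usd` is a hypothetical stand-alone helper, and it rounds to the nearest dollar instead of truncating so that tiny floating-point errors cannot shift the result by one.

```python
USD_PER_YEN = 0.0077  # exchange rate quoted in the tutorial (as of 5/15/2022)

def billions_of_yen_to_usd(billions):
    """Convert a sales figure given in billions of yen to whole US dollars."""
    yen = billions * 1_000_000_000
    return round(yen * USD_PER_YEN)

# 2.468 and 1.071 billion yen are the Moment of Bloom and Oni's Royale figures
print(billions_of_yen_to_usd(2.468))
print(billions_of_yen_to_usd(1.071))
```

In other words, 2.468 billion yen × 0.0077 USD per yen is roughly 19.0 million USD, which matches the Moment of Bloom row in the table.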
# China Data will be manually added from https://www.genshinlab.com/genshin-impact-revenue-chart/
# This site is most commonly referred to when gauging character sales and is regularly maintained
# by certified people across the Genshin community.
#This data from China is on iOS only.
sales_CN = {}
# Sales in USD as of 5/15/2022
sales_CN['The Heron\'s Court'] = 19897071
sales_CN['Azure Excursion + Ballad in Goblets'] = 22767455
sales_CN['Reign of Serenity + Drifting Luminescence'] = 33560259
sales_CN['Everbloom Violet'] = 15110264
sales_CN['Gentry of Hermitage + Adrift in the Harbor'] = 26780298
sales_CN['Invitation to Mundane Life + The Transcendant One Returns'] = 16994406
sales_CN['Oni\'s Royale'] = 13404072
sales_CN['Secretum Secretorum + Born of Ocean Swell'] = 17026066
sales_CN['Moment of Bloom'] = 25226952
sales_CN = pd.DataFrame(sales_CN.items(), columns = ['Banner','Sales China iOS (in USD)'])
print('China')
sales_CN
China
| | Banner | Sales China iOS (in USD) |
|---|---|---|
| 0 | The Heron's Court | 19897071 |
| 1 | Azure Excursion + Ballad in Goblets | 22767455 |
| 2 | Reign of Serenity + Drifting Luminescence | 33560259 |
| 3 | Everbloom Violet | 15110264 |
| 4 | Gentry of Hermitage + Adrift in the Harbor | 26780298 |
| 5 | Invitation to Mundane Life + The Transcendant ... | 16994406 |
| 6 | Oni's Royale | 13404072 |
| 7 | Secretum Secretorum + Born of Ocean Swell | 17026066 |
| 8 | Moment of Bloom | 25226952 |
# Now we are going to make plots for Japan and China sales data versus the MyAnimeList favorite rating for their seiyuus.
# Double banners will have the voice actors' favorites summed together.
# For example: Azure Excursion + Ballad in Goblets would have Akira Ishida and Ayumu Murase's favorite ratings summed together on the x axis.
# Sales will always be on the y axis
plot_data = pd.DataFrame(columns = ['Banner', '#Seiyuu Favorites'], index = range(0,9))
plot_data['Banner'] = sales_CN['Banner']
plot_data['Characters'] = ['Kamisato Ayaka', 'Kamisato Ayato + Venti', 'Raiden Shogun + Sangonomiya Kokomi', 'Yae Miko','Zhongli + Ganyu',
'Xiao + Shenhe', 'Arataki Itto', 'Albedo + Eula', 'Hu Tao']
seiyuu_df['#MAL Favorites'] = seiyuu_df['#MAL Favorites'].str.replace(',','')
#print(seiyuu_df)
plot_data.loc[0,'#Seiyuu Favorites'] = int(seiyuu_df.loc[13,'#MAL Favorites'])
plot_data.loc[1,'#Seiyuu Favorites'] = int(seiyuu_df.loc[11,'#MAL Favorites']) + int(seiyuu_df.loc[12,'#MAL Favorites'])
plot_data.loc[2,'#Seiyuu Favorites'] = int(seiyuu_df.loc[9,'#MAL Favorites']) + int(seiyuu_df.loc[10,'#MAL Favorites'])
plot_data.loc[3,'#Seiyuu Favorites'] = int(seiyuu_df.loc[8,'#MAL Favorites'])
plot_data.loc[4,'#Seiyuu Favorites'] = int(seiyuu_df.loc[6,'#MAL Favorites']) + int(seiyuu_df.loc[7,"#MAL Favorites"])
plot_data.loc[5,'#Seiyuu Favorites'] = int(seiyuu_df.loc[5,'#MAL Favorites']) + int(seiyuu_df.loc[4,'#MAL Favorites'])
plot_data.loc[6,'#Seiyuu Favorites'] = int(seiyuu_df.loc[3,'#MAL Favorites'])
plot_data.loc[7,'#Seiyuu Favorites'] = int(seiyuu_df.loc[1,'#MAL Favorites']) + int(seiyuu_df.loc[2,'#MAL Favorites'])
plot_data.loc[8,'#Seiyuu Favorites'] = int(seiyuu_df.loc[0,'#MAL Favorites'])
plot_data['Japan Sales in USD'] = sales_JP['Sales Japan (in USD)']
plot_data['China Sales in USD'] = sales_CN['Sales China iOS (in USD)']
#print(plot_data)
# Keep a column of combined iOS sales in Japan and China
plot_data['China and Japan Combined Sales'] = plot_data['Japan Sales in USD'] + plot_data['China Sales in USD']
plot_data
| | Banner | #Seiyuu Favorites | Characters | Japan Sales in USD | China Sales in USD | China and Japan Combined Sales |
|---|---|---|---|---|---|---|
| 0 | The Heron's Court | 53529 | Kamisato Ayaka | 17486700 | 19897071 | 37383771 |
| 1 | Azure Excursion + Ballad in Goblets | 16585 | Kamisato Ayato + Venti | 24647700 | 22767455 | 47415155 |
| 2 | Reign of Serenity + Drifting Luminescence | 41110 | Raiden Shogun + Sangonomiya Kokomi | 15607900 | 33560259 | 49168159 |
| 3 | Everbloom Violet | 17141 | Yae Miko | 13344100 | 15110264 | 28454364 |
| 4 | Gentry of Hermitage + Adrift in the Harbor | 7079 | Zhongli + Ganyu | 18433800 | 26780298 | 45214098 |
| 5 | Invitation to Mundane Life + The Transcendant ... | 42093 | Xiao + Shenhe | 18071900 | 16994406 | 35066306 |
| 6 | Oni's Royale | 593 | Arataki Itto | 8246700 | 13404072 | 21650772 |
| 7 | Secretum Secretorum + Born of Ocean Swell | 3635 | Albedo + Eula | 12835900 | 17026066 | 29861966 |
| 8 | Moment of Bloom | 45055 | Hu Tao | 19003600 | 25226952 | 44230552 |
#constructing plot showing correlation between sales data in china & japan vs rolls for our character banner dataset
#rolls per banner, in the same order as plot_data (double banners have both characters' rolls summed)
#NOTE: the last entry (Hu Tao, 454) is twice the 227 rolls plotted for her in the bar chart above
newcount = [578, 547, 388, 431, 594, 515, 561, 538, 454]
df = pd.DataFrame(newcount)
annotations = plot_data['Characters'].to_numpy()
X_Plots = df[0].to_numpy()
Y_Plots = plot_data['China and Japan Combined Sales'].to_numpy()
#fit a straight line (degree 1 polynomial) through the points
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
trend = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = trend(x_new)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 500, color = "red")
plt.xlabel("Rolls on Each Banner")
plt.ylabel("Sales Data in China and Japan (USD)")
plt.title("Sales Data in China and Japan (USD) vs Rolls for the Character of Banner ",fontsize=40)
#label each point with the character(s) featured on that banner
for i, label in enumerate(annotations):
    plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.show()
The graph above plots the number of rolls on each banner against the combined sales in China and Japan (USD). The result is not what we hoped for: intuitively, more rolls on a banner should go along with more sales, yet our data does not reflect this. The discrepancy between our roll data and the total sales data is most likely due to our small sample of players, so this regression line is not representative of the true relationship.
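One way to put a number on how weakly the rolls track the sales is the Pearson correlation coefficient. The sketch below computes it in plain Python from the rolls and combined sales figures shown above; the `pearson` helper is written out here for illustration and is not part of our notebook.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Rolls per banner and combined sales, copied from the cells above
rolls = [578, 547, 388, 431, 594, 515, 561, 538, 454]
sales = [37383771, 47415155, 49168159, 28454364, 45214098,
         35066306, 21650772, 29861966, 44230552]

r = pearson(rolls, sales)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

A value of r close to 0 means the fitted line explains very little of the variation in sales, while a value close to 1 or -1 would indicate a strong linear relationship.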
Twitter was our social media platform of choice for collecting the so-called drip marketing post data. We chose English Twitter because it had the most drip marketing posts available. The goal is to get a general idea of a character's popularity from the number of likes and retweets on their initial reveal post, which we measure by adding the two quantities together. We can then use this data for analysis later.
Two minor drawbacks were that none of us has Twitter Developer Platform access to pull the data straight from Twitter using tweepy, and that not all drip marketing posts were available on Genshin Impact's Twitter. One of us applied for developer access, but Twitter did not respond in time, so we ended up manually entering each character's drip marketing post data into an Excel file and loading the data from there. Yes, we quite literally scrolled through every Twitter post to gather the information. It is accurate as of 5/14/2022.
As pointed out several times in this tutorial, Hoyoverse often deletes data that is more than 6 months old, so any data from more than 6 months ago may be inconsistent for the sake of calculations. Hence, for those characters who did not have a drip marketing post available, we marked all of their columns as missing ('NaN'). We also collected the drip marketing data for a character who is not out yet, Yelan, which we will use for data analysis later on. Yelan was due to be Genshin Impact's newest 5* promotional character in the version 2.7 update, but she has not been released yet because the Shanghai COVID lockdown delayed the update. The update was originally scheduled for May 10, 2022, but Hoyoverse was forced to extend The Heron's Court wish banner until further notice.
#extract genshin 'drip market' twitter data that is available
t = pd.read_excel('twitterdata.xlsx')
twitter = pd.DataFrame(t, columns = ['Character', 'Date', 'Likes', 'Retweets', 'Post Activity'])
twitter['Post Activity'] = twitter['Likes'] + twitter['Retweets']
twitter
| | Character | Date | Likes | Retweets | Post Activity |
|---|---|---|---|---|---|
| 0 | Hu Tao | NaT | NaN | NaN | NaN |
| 1 | Eula | 2021-05-11 | 63503.0 | 6873.0 | 70376.0 |
| 2 | Albedo | NaT | NaN | NaN | NaN |
| 3 | Arataki Itto | 2021-10-11 | 188590.0 | 34457.0 | 223047.0 |
| 4 | Shenhe | 2021-11-22 | 207072.0 | 36283.0 | 243355.0 |
| 5 | Xiao | NaT | NaN | NaN | NaN |
| 6 | Zhongli | NaT | NaN | NaN | NaN |
| 7 | Ganyu | NaT | NaN | NaN | NaN |
| 8 | Yae Miko | 2021-12-31 | 343000.0 | 73082.0 | 416082.0 |
| 9 | Raiden Shogun | 2021-07-22 | 88703.0 | 14687.0 | 103390.0 |
| 10 | Sangonomiya Kokomi | 2021-07-22 | 70249.0 | 10448.0 | 80697.0 |
| 11 | Kamisato Ayato | 2022-02-04 | 199244.0 | 45039.0 | 244283.0 |
| 12 | Venti | NaT | NaN | NaN | NaN |
| 13 | Kamisato Ayaka | 2021-06-07 | 103889.0 | 15472.0 | 119361.0 |
| 14 | Yelan | 2022-03-28 | 156659.0 | 24525.0 | 181184.0 |
#creates a new twitter dataframe to be worked with (copied so the edits below don't touch the original)
twitter2 = twitter.dropna().copy()
#Raiden Shogun and Sangonomiya Kokomi are combined into a single row
twitter2.loc[9, 'Character'] = 'Raiden Shogun + Sangonomiya Kokomi'
twitter2.loc[9, 'Post Activity'] = twitter2.loc[9, 'Post Activity'] + twitter2.loc[10, 'Post Activity']
twitter2 = twitter2.drop([10])
twitter2 = twitter2.drop(['Likes', 'Retweets'], axis = 1)
twitter2 = twitter2.astype({'Post Activity': 'int'})
#reorders the rows and leaves out Yelan (row 14), who is only used for prediction
twitter2 = twitter2.reindex([13, 11, 9, 8, 4, 3, 1])
predict = twitter2
twitter2
| | Character | Date | Post Activity |
|---|---|---|---|
| 13 | Kamisato Ayaka | 2021-06-07 | 119361 |
| 11 | Kamisato Ayato | 2022-02-04 | 244283 |
| 9 | Raiden Shogun + Sangonomiya Kokomi | 2021-07-22 | 184087 |
| 8 | Yae Miko | 2021-12-31 | 416082 |
| 4 | Shenhe | 2021-11-22 | 243355 |
| 3 | Arataki Itto | 2021-10-11 | 223047 |
| 1 | Eula | 2021-05-11 | 70376 |
#pie chart creation for the dataframe above
pie = np.array(twitter2['Post Activity'])
labelz = twitter2['Character']
def value(val):
    #autopct passes in the slice's percentage; convert it back to the raw count for the label
    return '{:.0f}%\n({:.0f})'.format(val, np.round(val/100.*pie.sum(),0))
plt.pie(pie, labels = labelz, autopct=value)
plt.show()
Looking at the pie chart of our Twitter data above, there are some very interesting things to notice. For starters, Yae Miko seems to be by far the most popular character, while Eula and Kamisato Ayaka seem to be the least popular. Eula and Kamisato Ayaka have the two oldest drip marketing posts in our dataset, so our data is probably skewed by the size of Genshin's Twitter following at the time of each post. Assuming that the following has grown over time, the number of likes and retweets on a post should also increase with time. Also, as supported by the sales data above, Raiden Shogun is the highest revenue character in the game, which should correlate with being the most popular, so her sitting in the middle of the pack in this dataset can also be attributed to her drip marketing post being on the older side. On the flip side, Yae Miko is the most popular character here even though hers is not the newest drip marketing post: Kamisato Ayato's post, for example, is newer, yet he is only about average in popularity. In conclusion, the date of a drip marketing post appears to affect the post's likes and retweets.
Sadly, there is no dataset available that tracks Genshin's Twitter following over time. With one, our next step would be to scale these values by the follower count at the date of each post for more accuracy. We will still use this data later on to check for any possible correlations.
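One rough way to check the effect of the post date is to correlate each post's date with its Post Activity. The sketch below re-enters the nine available posts from our Twitter table by hand. Yae Miko's post is entered as 2021-12-31, which is an assumption based on Kamisato Ayato's post being newer than hers:

```python
import pandas as pd

# Individual posts from our Twitter table (Raiden Shogun and Kokomi kept separate).
posts = pd.DataFrame({
    'Character': ['Eula', 'Kamisato Ayaka', 'Raiden Shogun', 'Sangonomiya Kokomi',
                  'Arataki Itto', 'Shenhe', 'Yae Miko', 'Kamisato Ayato', 'Yelan'],
    'Date': ['2021-05-11', '2021-06-07', '2021-07-22', '2021-07-22',
             '2021-10-11', '2021-11-22', '2021-12-31',  # Yae Miko date is assumed
             '2022-02-04', '2022-03-28'],
    'Post Activity': [70376, 119361, 103390, 80697,
                      223047, 243355, 416082, 244283, 181184],
})

# Turn each date into "days since the oldest post" so it can be correlated.
posts['Date'] = pd.to_datetime(posts['Date'])
posts['Days'] = (posts['Date'] - posts['Date'].min()).dt.days

# Pearson correlation between post age and engagement.
r = posts['Days'].corr(posts['Post Activity'])
print(round(r, 3))
```

A positive value of `r` is consistent with newer posts getting more engagement, but with only nine posts it should be read as a hint rather than proof.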
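If follower counts over time were available, the scaling could look something like the sketch below. The function name `activity_per_1k_followers` is our own, and the follower counts are invented placeholders, not real numbers for Genshin Impact's account:

```python
def activity_per_1k_followers(post_activity, followers_at_post):
    """Scale likes + retweets by the follower count at the time of the post."""
    if followers_at_post <= 0:
        raise ValueError('follower count must be positive')
    return post_activity / followers_at_post * 1000

# Post Activity values come from our table; the follower counts are made-up
# placeholders used only to show how the scaling changes the comparison.
eula = activity_per_1k_followers(70376, followers_at_post=1_000_000)
yae = activity_per_1k_followers(416082, followers_at_post=2_000_000)
print(eula, yae)
```

Under these placeholder numbers the gap between the two posts shrinks once the larger audience is accounted for, which is exactly the kind of correction we cannot make without real follower data.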
Another way we came up with to collect data was to create a Google Form survey. We shared this form in about 20 different Discord servers of Genshin players to gather respondents. We made sure to post it in the Mains servers for all the characters in our dataset to even out any biases. The purpose was to get a local sample of Genshin wish behavior. For each character in our dataset, from Hu Tao to Kamisato Ayaka, we asked whether the respondent rolled on their banner. If they did, they were prompted to check off as many of the following reasons as applied: Character Design, Gameplay, Meta Relevance, In-Game Lore, and Voice Acting Cast. If not, they were not asked anything else about that specific character.
This survey was great because it gives us not only the % of people that rolled for each character, but also insight into the reasons why. We downloaded the responses as an Excel file and then parsed the data to count these things.
We tried our best to keep this survey sample unbiased by sharing it widely across public Genshin Impact affiliated Discord servers. For example, sharing the survey only in "Ganyu Mains" would skew the data in favor of Ganyu. To counter this, we shared the survey across servers dedicated to all of the characters, as well as some neutral servers such as "Keqing Mains", which is commonly known as KQM.
If you're interested in learning about meta-related matters for Genshin Impact, we recommend joining the Keqing Mains Discord and checking out their theorycrafting libraries. Genshin Impact theorycrafting is a massive field, and it is often used to demonstrate how effective and efficient a character is in the meta.
We have not done a fully meta-based analysis in our project because nearly every Genshin player you ask (no matter how much they actually understand the game) will tell you that "Ganyu and Hu Tao are the best". This would skew our analysis heavily, and we wish to avoid such a predicament.
Here is an example of a "meta" team if you are interested: https://youtu.be/YEMUdnhU7A4
The data from this survey is what was gathered up until 5/15/22.
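Since each "Why did you wish for ...?" answer stores all checked reasons in one comma-separated string, the answers have to be split before they can be counted. A minimal pandas sketch of that idea, using three made-up answers rather than our real responses:

```python
import numpy as np
import pandas as pd

# Toy answers in the same shape as a "Why did you wish for ...?" column;
# these three rows are made up for illustration.
why = pd.Series([
    'Character Design, Gameplay',
    np.nan,                      # respondent answered "No", so no reasons were asked
    'Gameplay, Meta Relevance',
])

# Split each answer on ', ', give every reason its own row, then tally.
# value_counts() ignores the NaN row by default.
reason_counts = why.str.split(', ').explode().value_counts()
print(reason_counts)
```

This is an alternative to counting with a dictionary and gives the same tallies. Which one to use is mostly a matter of taste.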
#extracts data from a survey about genshin wishes we created
t1 = pd.read_excel('Genshin_Form.xlsx')
form = pd.DataFrame(t1, columns = ['Did you wish for Hu Tao?', 'Why did you wish for Hu Tao?',
'Did you wish for Eula?', 'Why did you wish for Eula?',
'Did you wish for Albedo?', 'Why did you wish for Albedo?',
'Did you wish for Arataki Itto?', 'Why did you wish for Arataki Itto?',
'Did you wish for Shenhe?', 'Why did you wish for Shenhe?',
'Did you wish for Xiao?', 'Why did you wish for Xiao?',
'Did you wish for Zhongli?', 'Why did you wish for Zhongli?',
'Did you wish for Ganyu?', 'Why did you wish for Ganyu?',
'Did you wish for Yae Miko?', 'Why did you wish for Yae Miko?',
'Did you wish for Raiden Shogun?', 'Why did you wish for Raiden Shogun?',
'Did you wish for Sangonomiya Kokomi?', 'Why did you wish for Sangonomiya Kokomi?',
'Did you wish for Kamisato Ayato?', 'Why did you wish for Kamisato Ayato?',
'Did you wish for Venti?', 'Why did you wish for Venti?',
'Did you wish for Kamisato Ayaka?', 'Why did you wish for Kamisato Ayaka?'])
form
| | Did you wish for Hu Tao? | Why did you wish for Hu Tao? | Did you wish for Eula? | Why did you wish for Eula? | Did you wish for Albedo? | Why did you wish for Albedo? | Did you wish for Arataki Itto? | Why did you wish for Arataki Itto? | Did you wish for Shenhe? | Why did you wish for Shenhe? | ... | Did you wish for Raiden Shogun? | Why did you wish for Raiden Shogun? | Did you wish for Sangonomiya Kokomi? | Why did you wish for Sangonomiya Kokomi? | Did you wish for Kamisato Ayato? | Why did you wish for Kamisato Ayato? | Did you wish for Venti? | Why did you wish for Venti? | Did you wish for Kamisato Ayaka? | Why did you wish for Kamisato Ayaka? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Yes | Character Design, Voice Acting Cast | Yes | Character Design, Gameplay | No | NaN | No | NaN | No | NaN | ... | No | NaN | Yes | Character Design, Gameplay, Voice Acting Cast | Yes | Character Design, Gameplay, In-Game Lore, Voic... | Yes | Meta Relevance | Yes | Character Design, Gameplay, Meta Relevance, In... |
| 1 | Yes | Character Design, Gameplay, Meta Relevance, Vo... | No | NaN | No | NaN | No | NaN | No | NaN | ... | Yes | Character Design, Meta Relevance | No | NaN | No | NaN | Yes | Character Design, Gameplay, Meta Relevance | Yes | Character Design, Gameplay |
| 2 | Yes | Gameplay, Meta Relevance | Yes | Gameplay, Meta Relevance | No | NaN | No | NaN | Yes | Character Design, Gameplay, Meta Relevance | ... | Yes | Character Design, Gameplay, Meta Relevance | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | No | NaN | Yes | Gameplay, Meta Relevance |
| 3 | Yes | Character Design, Gameplay, Voice Acting Cast | Yes | Character Design, Gameplay | No | NaN | No | NaN | Yes | Character Design | ... | Yes | Character Design, Gameplay, Meta Relevance, In... | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | No | NaN | No | NaN |
| 4 | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | No | NaN | ... | Yes | Character Design, Gameplay, Meta Relevance, In... | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | Yes | In-Game Lore | Yes | Character Design, Gameplay, Meta Relevance, In... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97 | Yes | Gameplay, Voice Acting Cast | Yes | Character Design, Gameplay, Voice Acting Cast | Yes | Gameplay | No | NaN | Yes | Character Design, Gameplay, Voice Acting Cast | ... | Yes | Character Design, Gameplay, Meta Relevance, In... | Yes | Gameplay, Meta Relevance | Yes | Gameplay, In-Game Lore, Voice Acting Cast | Yes | Gameplay, In-Game Lore | Yes | Character Design, Gameplay, Meta Relevance, In... |
| 98 | No | NaN | No | NaN | No | NaN | No | NaN | No | NaN | ... | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | No | NaN | Yes | Character Design |
| 99 | Yes | Character Design, Gameplay, Meta Relevance, Vo... | No | NaN | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | ... | No | NaN | No | NaN | Yes | Character Design, Gameplay | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... |
| 100 | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN | No | NaN | No | NaN | Yes | Character Design, Gameplay, In-Game Lore | ... | Yes | Character Design, Gameplay, Meta Relevance, In... | Yes | Character Design, Gameplay, Meta Relevance | No | NaN | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... |
| 101 | Yes | Character Design | Yes | Character Design | Yes | Character Design | No | NaN | Yes | In-Game Lore | ... | Yes | Character Design, Meta Relevance | Yes | Character Design, In-Game Lore | No | NaN | Yes | Character Design, Gameplay, Meta Relevance, In... | No | NaN |
102 rows × 28 columns
#tracks the counts for each possible selection in the survey per character
characters = ['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu', 'Yae Miko',
              'Raiden Shogun', 'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti', 'Kamisato Ayaka']
reasons = ['Character Design', 'Gameplay', 'Meta Relevance', 'In-Game Lore', 'Voice Acting Cast']

def count_responses(form, character):
    #one counter per character: Yes/No answers plus every reason that was checked off
    counter = {key: 0 for key in ['Yes', 'No'] + reasons}
    for answer in form['Did you wish for ' + character + '?']:
        counter[answer] += 1
    #only surveyers who answered Yes were asked for reasons; everyone else is NaN
    for answer in form['Why did you wish for ' + character + '?'].dropna():
        for reason in answer.split(', '):
            counter[reason] += 1
    return counter

counters = {c: count_responses(form, c) for c in characters}
#builds one column per count, with the characters in the same order as above
popularity = {'Character': characters,
              'Wished For': [counters[c]['Yes'] for c in characters],
              'Not Wished For': [counters[c]['No'] for c in characters]}
for reason in reasons:
    popularity['Wished for ' + reason] = [counters[c][reason] for c in characters]
pop = pd.DataFrame(popularity)
pop
| | Character | Wished For | Not Wished For | Wished for Character Design | Wished for Gameplay | Wished for Meta Relevance | Wished for In-Game Lore | Wished for Voice Acting Cast |
|---|---|---|---|---|---|---|---|---|
| 0 | Hu Tao | 63 | 39 | 45 | 39 | 34 | 16 | 27 |
| 1 | Eula | 47 | 55 | 39 | 30 | 14 | 14 | 11 |
| 2 | Albedo | 41 | 61 | 32 | 30 | 5 | 23 | 16 |
| 3 | Arataki Itto | 36 | 66 | 30 | 23 | 6 | 23 | 19 |
| 4 | Shenhe | 31 | 71 | 26 | 13 | 7 | 14 | 7 |
| 5 | Xiao | 40 | 62 | 30 | 32 | 17 | 24 | 17 |
| 6 | Zhongli | 73 | 29 | 53 | 50 | 46 | 50 | 34 |
| 7 | Ganyu | 44 | 58 | 38 | 30 | 31 | 19 | 19 |
| 8 | Yae Miko | 38 | 64 | 35 | 19 | 3 | 26 | 15 |
| 9 | Raiden Shogun | 73 | 29 | 59 | 54 | 48 | 45 | 29 |
| 10 | Sangonomiya Kokomi | 40 | 62 | 30 | 30 | 16 | 22 | 14 |
| 11 | Kamisato Ayato | 41 | 61 | 34 | 33 | 10 | 18 | 17 |
| 12 | Venti | 59 | 43 | 32 | 44 | 36 | 27 | 14 |
| 13 | Kamisato Ayaka | 64 | 38 | 48 | 47 | 39 | 24 | 26 |
#creates a more ideal dataframe which shows us the % of surveyers who wished for each character and the % of those who did as to why
popfinal = {'Character': [], 'Wished %': [], 'Design %': [], 'Gameplay %': [], 'Meta %': [], 'Lore %': [], 'Voice Acting %': []}
popf = pd.DataFrame(popfinal)
popf['Character'] = pop['Character']
popf['Wished %'] = round((pop['Wished For'] / (pop['Wished For'] + pop['Not Wished For']) * 100), 3)
popf['Design %'] = round((pop['Wished for Character Design'] / pop['Wished For'] * 100), 3)
popf['Gameplay %'] = round((pop['Wished for Gameplay'] / pop['Wished For'] * 100), 3)
popf['Meta %'] = round((pop['Wished for Meta Relevance'] / pop['Wished For'] * 100), 3)
popf['Lore %'] = round((pop['Wished for In-Game Lore'] / pop['Wished For'] * 100), 3)
popf['Voice Acting %'] = round((pop['Wished for Voice Acting Cast'] / pop['Wished For'] * 100))
popf
| | Character | Wished % | Design % | Gameplay % | Meta % | Lore % | Voice Acting % |
|---|---|---|---|---|---|---|---|
| 0 | Hu Tao | 61.765 | 71.429 | 61.905 | 53.968 | 25.397 | 43.0 |
| 1 | Eula | 46.078 | 82.979 | 63.830 | 29.787 | 29.787 | 23.0 |
| 2 | Albedo | 40.196 | 78.049 | 73.171 | 12.195 | 56.098 | 39.0 |
| 3 | Arataki Itto | 35.294 | 83.333 | 63.889 | 16.667 | 63.889 | 53.0 |
| 4 | Shenhe | 30.392 | 83.871 | 41.935 | 22.581 | 45.161 | 23.0 |
| 5 | Xiao | 39.216 | 75.000 | 80.000 | 42.500 | 60.000 | 42.0 |
| 6 | Zhongli | 71.569 | 72.603 | 68.493 | 63.014 | 68.493 | 47.0 |
| 7 | Ganyu | 43.137 | 86.364 | 68.182 | 70.455 | 43.182 | 43.0 |
| 8 | Yae Miko | 37.255 | 92.105 | 50.000 | 7.895 | 68.421 | 39.0 |
| 9 | Raiden Shogun | 71.569 | 80.822 | 73.973 | 65.753 | 61.644 | 40.0 |
| 10 | Sangonomiya Kokomi | 39.216 | 75.000 | 75.000 | 40.000 | 55.000 | 35.0 |
| 11 | Kamisato Ayato | 40.196 | 82.927 | 80.488 | 24.390 | 43.902 | 41.0 |
| 12 | Venti | 57.843 | 54.237 | 74.576 | 61.017 | 45.763 | 24.0 |
| 13 | Kamisato Ayaka | 62.745 | 75.000 | 73.438 | 60.938 | 37.500 | 41.0 |
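As a quick worked check of how these percentages are built, here is the calculation for Hu Tao using her counts from the table of raw survey counts above. Note the two different denominators: Wished % is out of all respondents, while each reason % is out of only those who wished:

```python
# Counts for Hu Tao taken from the table of raw survey counts.
wished, not_wished, design = 63, 39, 45

wished_pct = round(wished / (wished + not_wished) * 100, 3)  # share of all respondents
design_pct = round(design / wished * 100, 3)                  # share of those who wished
print(wished_pct, design_pct)  # 61.765 71.429
```

Because respondents could check off several reasons, the reason percentages for one character do not have to add up to 100.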
#multiple bar plot creation for dataframe shown above
chars_bar = popf['Character']
wished_bar = popf['Wished %']
design_bar = popf['Design %']
gp_bar = popf['Gameplay %']
meta_bar = popf['Meta %']
lore_bar = popf['Lore %']
va_bar = popf['Voice Acting %']
x_axis = np.arange(len(chars_bar))
f, ax = plt.subplots(figsize=(25,10))
plt.bar(x_axis -0.25, wished_bar, width=0.1, label = 'Wished %')
plt.bar(x_axis -0.15, design_bar, width=0.1, label = 'Design %')
plt.bar(x_axis -0.05, gp_bar, width=0.1, label = 'Gameplay %')
plt.bar(x_axis +0.05, meta_bar, width=0.1, label = 'Meta %')
plt.bar(x_axis +0.15, lore_bar, width=0.1, label = 'Lore %')
plt.bar(x_axis +0.25, va_bar, width=0.1, label = 'Voice Acting %')
plt.xticks(x_axis,chars_bar)
ax.legend(fontsize=10)
plt.show()
Again, looking at the multiple bar graph of our character dataset above, we notice some interesting things. We'll break these down into the different categories shown as follows:
#plot correlating sales in Japan and the likes and retweets of a character's 'drip marketing' post. includes regression line
#and includes predictive point for upcoming character Yelan based on her likes + retweets
import numpy as np
import matplotlib.pyplot as plt
annotate = twitter2['Character'].values
x_data = twitter2['Post Activity'].values
y_data = sales_JP['Sales Japan (in USD)'].values
z = np.polyfit(x = x_data, y = y_data, deg=1)
f = np.poly1d(z)
x_new = np.linspace(x_data.min(), x_data.max(), 500)
y_new = f(x_new)
y_test = f(181184) #181184 - Yelan's likes & retweets
plt.figure(figsize = (20,10))
plt.plot(x_data, y_data,'o',x_new,y_new)
plt.scatter(x_data,y_data, s = 100, color = "green", marker ="^")
plt.scatter(181184, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("Likes and Retweets of Banner")
plt.ylabel("Sales Data in Japan (USD)")
plt.title("Sales Data Japan (USD) vs Total of Likes and Retweets of a Character ",fontsize=40)
for i, label in enumerate(annotate):
plt.annotate(label, (x_data[i], y_data[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
15729667.877619218
Here is a graph of the total likes and retweets of each banner compared to the sales data in Japan (USD). The two do not correlate well, which is not entirely surprising: as previously mentioned, we collected the English Twitter data because it had more drip marketing content, and English-language engagement does not necessarily track sales in Japan. Therefore, this linear regression is unrepresentative of the game's data.
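To put a number on how weak or strong such a relationship is, one option is the Pearson correlation coefficient. A minimal sketch, assuming the two columns are passed in as equal-length arrays. `pearson_r` is a hypothetical helper name, and the three-point arrays below are synthetic, not sales data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
    return np.corrcoef(x, y)[0, 1]

# Synthetic sanity checks: a perfectly increasing and a perfectly decreasing line.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```

In the notebook this could be called on the same `x_data` and `y_data` arrays that are passed to `np.polyfit`. A value near 0 would back up the conclusion that the regression line is unrepresentative.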
#plot correlating sales in China and the likes and retweets of a character's 'drip marketing' post. includes regression line
#and includes predictive point for upcoming character Yelan based on her likes + retweets
annotate = twitter2['Character'].values
x_data1 = twitter2['Post Activity'].values
y_data1 = sales_CN['Sales China iOS (in USD)'].values
z = np.polyfit(x = x_data1, y = y_data1, deg=1)
f = np.poly1d(z)
x_new = np.linspace(x_data1.min(), x_data1.max(), 100)
y_new = f(x_new)
y_test = f(181184) #181184 - Yelan's likes & retweets
plt.figure(figsize = (20,10))
plt.plot(x_data1, y_data1,'o',x_new,y_new)
plt.scatter(x_data1,y_data1, s = 100, color = "pink")
plt.scatter(181184, y_test, s = 100, color = "purple")
plt.xlabel("Likes and Retweets of Banner")
plt.ylabel("Sales Data in China (USD)")
plt.title("Sales Data China (USD) vs Total of Likes and Retweets of a Character ",fontsize=40)
for i, label in enumerate(annotate):
plt.annotate(label, (x_data1[i], y_data1[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
20285617.770686075
This is a graph of the total likes and retweets of each banner compared to the sales data in China (USD). Once again, we collected the data from English Twitter to get more drip marketing content, so it does not reflect the actual sales data in China. Therefore, this linear regression is also unrepresentative of the game's data.
#plot correlating Sales in japan and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
annotations = plot_data['Characters'].to_numpy()
X_Plots = plot_data['#Seiyuu Favorites'].to_numpy().astype(str).astype(int)
Y_Plots = plot_data['Japan Sales in USD'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "blue")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in Japan (USD)")
plt.title("Sales Data in Japan (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left')
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
15615379.470786579
This graph compares the seiyuu favorites on MyAnimeList with the sales data in Japan (USD). We can see a very clear positive correlation between the two variables: when the number of seiyuu favorites is higher, the sales in Japan are higher as well. Since this is a nice representation of the game's data, we used the linear regression line to predict Yelan's sales in Japan. With Yelan's seiyuu at 16319 favorites, the line predicts about 15615379.47 USD in sales in Japan.
#plot correlating Sales in china and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
annotations = plot_data['Characters'].to_numpy()
X_Plots = plot_data['#Seiyuu Favorites'].to_numpy().astype(str).astype(int)
Y_Plots = plot_data['China Sales in USD'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "red")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in China (USD)")
plt.title("Sales Data in China (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left')
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
20151637.914954036
This graph compares the seiyuu favorites on MyAnimeList with the sales data in China (USD). Once again, we can see a very clear positive correlation between the two variables: when the number of seiyuu favorites is higher, the sales in China are higher as well. Since this is a nice representation of the game's data, we used the linear regression line to predict Yelan's sales in China. With Yelan's seiyuu at 16319 favorites, the line predicts about 20151637.91 USD in sales in China.
#plot correlating Sales in japan & china and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
annotations = plot_data['Characters']
X_Plots = plot_data['#Seiyuu Favorites'].astype(str).astype(int)
Y_Plots = plot_data['China and Japan Combined Sales']
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in China and Japan (USD)")
plt.title("Sales Data in China and Japan (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left')
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
35767017.385740615
This graph compares the seiyuu favorites on MyAnimeList with the combined sales data in China and Japan (USD). We can see a very clear positive correlation between the two variables: when the number of seiyuu favorites is higher, the combined sales in China and Japan are higher as well. Since this is a nice representation of the game's data, we used the linear regression line to predict Yelan's combined sales in China and Japan. With Yelan's seiyuu at 16319 favorites, the line predicts about 35767017.39 USD in combined sales.
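The seiyuu plots all repeat the same fit-and-predict steps, so they could be wrapped in one helper that also reports how well the line fits. This is only a sketch: `fit_line` is a hypothetical name, R² is an extra measure we did not compute in the notebook, and the four points below are synthetic:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line through (x, y); returns the fitted poly1d and R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    line = np.poly1d(np.polyfit(x, y, deg=1))
    residual = np.sum((y - line(x)) ** 2)   # variation the line does not explain
    total = np.sum((y - y.mean()) ** 2)     # total variation in y
    r_squared = 1 - residual / total
    return line, r_squared

# Synthetic points close to y = 2x + 1, used only to show the call pattern.
line, r_squared = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(line(10), r_squared)
```

With the real data, `line(16319)` would give the same prediction for Yelan as `f(16319)` in the cells above, and an R² close to 1 would support calling the correlation very clear.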
#plot correlating rolls on our character banner data and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
count1 = [227, 425, 113, 561, 413, 102, 63, 531, 431, 336, 52, 498, 49, 578]
df = pd.DataFrame(count1)
#seiyuu_df = seiyuu_df.drop([14])
seiyuu_df['#MAL Favorites'] = seiyuu_df['#MAL Favorites'].astype(int)
annotations = seiyuu_df['Character'].to_numpy()
Y_Plots = df[0].to_numpy()
X_Plots = seiyuu_df['#MAL Favorites'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#MAL Favorites")
plt.ylabel("Rolls on Each Banner")
plt.title("Rolls for the Character of Banner vs #MAL Favorites",fontsize=40)
for i, label in enumerate(annotations):
plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
312.93219775491764
This is a graph of the rolls on each banner compared to MyAnimeList favorites. There is a slight positive correlation, which matches our survey data above: voice acting did not play a major role in rolling for a character, but it still played some role. Using the regression line, we can therefore predict about 313 rolls (312.93219775491764) on Yelan's banner in our data from her seiyuu's 16319 MAL favorites.
#plot correlating rolls on our character banner data and the twitter likes and retweets. includes regression line
#and includes predictive point for upcoming character Yelan based on her likes + retweets
count2 = [578, 498, 388, 431, 413, 561, 425]
df = pd.DataFrame(count2)
annotations = twitter2['Character'].to_numpy()
Y_Plots = df[0].to_numpy()
X_Plots = twitter2['Post Activity'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(181184) #181184 - Yelan's likes & retweets
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(181184, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("Twitter Likes & Retweets")
plt.ylabel("Rolls on Each Banner")
plt.title("Rolls for the Character of Banner vs Twitter Likes & Retweets",fontsize=40)
for i, label in enumerate(annotations):
plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
474.73909806068013
This is a graph of the rolls on each banner compared to the Twitter likes & retweets. This graph is incomplete because we collected the data from English Twitter, as stated above, so it does not really reflect the game's data either. With only seven data points, we do not have enough data to support a conclusion from this graph.
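One way to see how fragile a seven-point fit is would be to refit the line with each character left out and watch how much the Yelan prediction moves. The sketch below uses the Post Activity values and the `count2` roll counts from this section. The leave-one-out check itself is an extra step we did not run in the notebook, and `predict_rolls` is a hypothetical helper name:

```python
import numpy as np

# Values copied from this section: Post Activity from the twitter2 table and
# the roll counts from count2, in the same row order.
characters = ['Kamisato Ayaka', 'Kamisato Ayato', 'Raiden Shogun + Sangonomiya Kokomi',
              'Yae Miko', 'Shenhe', 'Arataki Itto', 'Eula']
post_activity = np.array([119361, 244283, 184087, 416082, 243355, 223047, 70376], dtype=float)
rolls = np.array([578, 498, 388, 431, 413, 561, 425], dtype=float)
yelan_activity = 181184

def predict_rolls(x, y, x_new):
    """Fit a straight line with np.polyfit and evaluate it at x_new."""
    return np.poly1d(np.polyfit(x, y, deg=1))(x_new)

# Same fit as the cell above, using all seven points.
full = predict_rolls(post_activity, rolls, yelan_activity)
print('all seven points:', round(full, 2))

# Refit with each character left out to see how far the prediction moves.
loo = {}
for i, name in enumerate(characters):
    keep = np.arange(len(characters)) != i
    loo[name] = predict_rolls(post_activity[keep], rolls[keep], yelan_activity)
    print('without', name + ':', round(loo[name], 2))
```

If removing a single character shifts the prediction noticeably, that is a sign the line is being driven by individual points rather than by a real trend.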
Special thanks to our friends Wallace Santos (BR) and Alex Dai (CN) for giving us high quality screen recordings to share